
fix: add Microsoft OSS compliance boilerplate #4

Merged
colombod merged 1 commit into main from fix/compliance-boilerplate-update
May 13, 2026


@sadlilas sadlilas commented May 3, 2026

Contributor

Adds the standard Microsoft OSS compliance scaffolding to bring this repo in line with the public Amplifier ecosystem standard.

Files added

  • CODE_OF_CONDUCT.md — verbatim from microsoft/amplifier-core
  • SECURITY.md — verbatim from microsoft/amplifier-core
  • SUPPORT.md — verbatim from microsoft/amplifier-core
  • LICENSE — MIT, verbatim from microsoft/amplifier-core

README.md updates

  • Appended ## Contributing section (verbatim from microsoft/amplifier-core)
  • Appended ## Trademarks section (verbatim from microsoft/amplifier-core)

This is part of a coordinated cleanup across all 14 private amplifier-* repos identified by the ecosystem audit on 2026-05-03. The same change is being applied uniformly to each — there is no per-repo customization.

🤖 Generated with Amplifier

Adds CODE_OF_CONDUCT.md, SECURITY.md, SUPPORT.md, LICENSE
and Contributing + Trademarks sections to README.md, copied
verbatim from microsoft/amplifier-core to bring this repo in
line with the public Amplifier ecosystem standard.

Generated by Amplifier ecosystem-audit recipe.
@colombod colombod merged commit b331591 into main May 13, 2026
1 check passed
@colombod colombod deleted the fix/compliance-boilerplate-update branch May 13, 2026 12:17
colombod added a commit that referenced this pull request Jun 18, 2026
…e-visibility signal (v4.0.1) (#15)

* docs: dangling-node reader audit sign-off (#278 Phase 1 gate)

Enumerate every node→edge reader in neo4j_store.py, services.py, and routers/
via the three spec-mandated grep commands. Classify each hit as TOLERANT or
NEEDS-FIX. Confirm get_node (neo4j_store.py:566) and get_edge (neo4j_store.py:601)
are SAFE independent point-lookups.

All 7 grep hits are TOLERANT:
- neo4j_store.py:616  get_edge() Cypher fallback — property-filtered edge lookup,
  not a node→edge walk; no node-existence dependency.
- services.py:70,125-127,135,143 — GraphState in-memory dict operations; write
  paths (70,125-127,143) and direct key lookup (135); none walk node→edge.
- routers/ — zero hits.

NEEDS-FIX count: 0. No code changes required.

Phase-1 gate PASSED. All other Phase-1 tasks may proceed.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: add neo4j_flush_chunk_rows/bytes config knobs (#278)

Add two config knobs for sub-transaction chunking in _flush_body:
- neo4j_flush_chunk_rows: int = 100  (cardinality bound)
- neo4j_flush_chunk_bytes: int = 4_194_304  (4 MiB payload bound)

A chunk closes when EITHER bound trips first. Tests verify defaults
via test_neo4j_flush_chunk_rows_default and test_neo4j_flush_chunk_bytes_default.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: thread flush_chunk_rows/bytes into Neo4jGraphStore.__init__ with clamp (#278)

- Add flush_chunk_rows (default 100) and flush_chunk_bytes (default 4_194_304) to __init__ signature
- Store as _flush_chunk_rows / _flush_chunk_bytes with max(1, value) clamp to prevent zero/negative chunks
- Add _make_store_chunked helper and three new tests covering: nominal values, clamping of non-positive inputs, and default values
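
A minimal sketch of the clamp described above, assuming a stand-in class (ChunkBounds is a hypothetical name, not the real Neo4jGraphStore). Only the defaults and the max(1, value) rule come from this commit message:

```python
class ChunkBounds:
    """Hypothetical stand-in for the bounds kept by Neo4jGraphStore.__init__."""

    def __init__(
        self,
        flush_chunk_rows: int = 100,
        flush_chunk_bytes: int = 4_194_304,  # 4 MiB
    ) -> None:
        # Clamp to at least 1 so a zero or negative setting can never
        # produce an empty chunk (and therefore a flush that never drains).
        self._flush_chunk_rows = max(1, flush_chunk_rows)
        self._flush_chunk_bytes = max(1, flush_chunk_bytes)


bounds = ChunkBounds(flush_chunk_rows=0, flush_chunk_bytes=-5)
```

Here bounds ends up with both values clamped to 1, while ChunkBounds() keeps the documented defaults.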

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: add _serialized_row_size byte estimator (#278)

Measures the JSON-serialized form of a row value (len(json.dumps(v,
default=str))) rather than len() on the dict/list, which would return
the element/key count and be blind to fat nested payloads such as large
messages arrays or context_snapshot dicts.

default=str ensures datetimes and other non-JSON-serialisable values
never raise, falling back to str() length in the unlikely event
json.dumps itself fails.

Tests:
- test_serialized_row_size_uses_serialized_form_not_len: fat dict with
  ~4000-char nested strings yields > 3000 (not 3 as len() would give)
- test_serialized_row_size_handles_unjsonable_value: datetime value
  returns > 0, proving no crash on non-JSON types
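
A sketch of the estimator as described, not the actual helper: the function name and the exact exceptions caught are assumptions. With json.dumps defaults (ensure_ascii=True) the output is pure ASCII, so the character count equals the encoded byte count.

```python
import json
from datetime import datetime
from typing import Any


def serialized_row_size(value: Any) -> int:
    """Estimate payload size from the JSON-serialized form of a row."""
    try:
        # default=str stringifies datetimes and other non-JSON values.
        return len(json.dumps(value, default=str))
    except (TypeError, ValueError):
        # Assumed fallback: e.g. non-string dict keys or circular references.
        return len(str(value))


fat_row = {"messages": "x" * 4000, "context_snapshot": {"k": "v"}, "n": 1}
print(len(fat_row), serialized_row_size(fat_row))
print(serialized_row_size({"ts": datetime(2026, 1, 1)}))
```

The first print shows the gap the commit describes: len() on the dict reports its key count, while the serialized size reflects the nested payload.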

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: add _chunk_dict/_chunk_list dual-bound chunk helpers (#278)

- Add _chunk_dict(snapshot, max_rows, max_bytes) generator that yields dict
  chunks bounded by both row count and byte size.
- Add _chunk_list(snapshot, max_rows, max_bytes) generator for list payloads
  with the same dual-bound logic.
- Both helpers implement the one-row floor: a single oversized row is always
  yielded alone, never split, never looped.
- _serialized_row_size() used for byte estimation in both helpers.
- 5 new tests cover: row bound, byte bound, one-row floor, empty input,
  and list variant. No row lost or duplicated.
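
One way the dual-bound generator could look, as a sketch under assumptions: the real _chunk_dict may decide the close point differently, and the private size helper here only mirrors the description of _serialized_row_size.

```python
import json
from typing import Any, Dict, Iterator


def _row_size(value: Any) -> int:
    # Stand-in for _serialized_row_size (assumed behaviour).
    return len(json.dumps(value, default=str))


def chunk_dict(
    snapshot: Dict[str, Any], max_rows: int, max_bytes: int
) -> Iterator[Dict[str, Any]]:
    """Yield dict chunks bounded by row count and by estimated byte size."""
    chunk: Dict[str, Any] = {}
    chunk_bytes = 0
    for key, row in snapshot.items():
        size = _row_size(row)
        # Close the current chunk when adding this row would break a bound.
        # The "chunk and" guard is the one-row floor: an oversized row is
        # still placed into an empty chunk and yielded alone.
        if chunk and (len(chunk) >= max_rows or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = {}, 0
        chunk[key] = row
        chunk_bytes += size
    if chunk:
        yield chunk


rows = {f"n{i}": {"v": i} for i in range(35)}
sizes = [len(c) for c in chunk_dict(rows, max_rows=10, max_bytes=10_000_000)]
```

With 35 small rows and a row bound of 10, sizes comes out as four chunks of 10, 10, 10 and 5 rows, with no row lost or duplicated.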

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* test: add capped Neo4j fixture + calibration guard for OOM proof (#278)

Add module-scoped neo4j_container_capped fixture to conftest.py that runs
neo4j:5.26.22-community with NEO4J_db_memory_transaction_max=2m, mirroring
the session-scoped neo4j_container bootstrap logic (random ports, 5-attempt
port-flake retry on APIError 'ports are not available', httpx readiness poll
up to 60s, remove=True, container.stop() teardown).  Cap is set via env at
startup — runtime dbms.setConfigValue does not exist on Community Edition.

Create tests/neo4j/test_oom_regression.py with:
- _OOM_CODE module constant
- _low_retry_store() helper: constructs Neo4jGraphStore, closes original
  30s-retry driver (no leak), swaps in AsyncGraphDatabase.driver with
  max_transaction_retry_time=2.0
- _buffer_fat_nodes() helper: buffers n single-phase node rows with ~blob_bytes
  blob property and UNIQUE prefix-scoped node_ids
- _purge_prefix() helper: DETACH DELETE for nodes under a prefix (order-independent)
- test_calibration_guard_tiny_write_succeeds: buffers one tiny node, flushes
  (must not raise), asserts MATCH count == 1

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* test: store-level OOM RED on capped container (enormous=OOM, small=drain-red) (#278)

Add two store-level OOM regression tests to test_oom_regression.py:

test_unbounded_single_phase_flush_ooms:
  - Enormous flush_chunk_rows/flush_chunk_bytes (10M rows, 10GB bytes)
  - 400 fat nodes × 20 KB = ~8 MB single-phase payload, 4× over the 2 MiB cap
  - Asserts TransientError with code == Neo.TransientError.General.MemoryPoolOutOfMemoryError
  - Asserts MATCH count == 0 (nothing commits on OOM, buffer restored)

test_chunked_flush_drains_same_single_phase_buffer (RED):
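  - Small flush_chunk_rows=50, flush_chunk_bytes=262_144 (256 KiB per chunk)
  - Same 400 fat nodes: the row bound alone would allow ~50 × 20 KB ≈ 1 MB per chunk,
    but the 256 KiB byte bound trips first (roughly a dozen rows), keeping each chunk
    ~8× UNDER the 2 MiB cap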
  - Currently FAILS with TransientError/MemoryPoolOutOfMemoryError because
    _flush_body does not use flush_chunk_rows/flush_chunk_bytes yet
  - GREEN state (after Task 8 fix): flush() must not raise, buffer empty, count == 400

Also adds TransientError import from neo4j.exceptions.

Test run (pre-fix):
  PASSED calibration_guard_tiny_write_succeeds
  PASSED test_unbounded_single_phase_flush_ooms (OOM confirmed, count == 0)
  FAILED test_chunked_flush_drains_same_single_phase_buffer (genuine RED)

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: chunked phased _flush_body coordinator — fixes OOM stall (#278)

Replace the single-transaction _flush_body with a phased, dual-bounded,
per-chunk-committed coordinator that eliminates the MemoryPoolOutOfMemoryError
caused by sending all buffered nodes/edges/patches in one transaction.

Changes:
- _flush_body now iterates each buffer through _chunk_dict/_chunk_list with
  self._flush_chunk_rows / self._flush_chunk_bytes bounds
- Each chunk is committed in its own independent execute_write (separate
  Neo4j session) — no multi-chunk explicit transactions that would
  re-collapse the memory bound
- Phase order: nodes → label patches → edges (preserves referential integrity)
- On any chunk failure: logs flush_chunk_failed + re-raises; finally block
  merges snapshot back into live buffers (full retry on next flush)
- _write_batch is byte-for-byte unchanged

Test results:
- tests/neo4j/test_oom_regression.py: 3/3 passed (calibration guard,
  enormous-bounds OOM cause asserted, small-bounds drains exactly 400 nodes)
- tests/test_neo4j_store.py: 101/101 passed (no regressions)
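
The coordinator shape described above can be sketched roughly as follows. This is an illustration, not the real _flush_body: PhasedFlusher, the writer callback, and the row-only chunking are assumptions (the byte bound is omitted for brevity), and each writer call stands in for one independently committed execute_write.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("flush_sketch")

# Hypothetical writer: one call == one independently committed chunk.
Writer = Callable[[str, list], Awaitable[None]]

PHASES = ("nodes", "patches", "edges")  # nodes before edges


class PhasedFlusher:
    def __init__(self, write: Writer, max_rows: int = 2) -> None:
        self._write = write
        self._max_rows = max(1, max_rows)
        self.buffers: dict = {phase: [] for phase in PHASES}

    async def flush(self) -> None:
        # Snapshot and empty the live buffers so new rows can keep arriving.
        snapshot = {phase: list(self.buffers[phase]) for phase in PHASES}
        for phase in PHASES:
            self.buffers[phase].clear()
        succeeded = False
        try:
            for phase in PHASES:
                rows = snapshot[phase]
                for start in range(0, len(rows), self._max_rows):
                    await self._write(phase, rows[start:start + self._max_rows])
            succeeded = True
        except Exception:
            logger.error("flush_chunk_failed")
            raise
        finally:
            if not succeeded:
                # Merge the whole snapshot back for a full retry next flush.
                for phase in PHASES:
                    self.buffers[phase] = snapshot[phase] + self.buffers[phase]
```

Restoring the full snapshot, including chunks that already committed, assumes the writes are safe to repeat on the next flush.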

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* test: coordinator within-bounds wiring guard + empty-buffer guard (#278)

Add two characterization/guard tests for the phased chunked-flush coordinator
(Task 8's _flush_body):

- test_coordinator_every_execute_write_within_bounds: seeds 35 nodes at
  rows=10, captures every execute_write payload, and asserts each node chunk
  satisfies len(nodes)<=10 AND (total_bytes<=10_000_000 OR len==1).

- test_coordinator_empty_buffer_makes_zero_calls: verifies that flush() with
  empty buffers short-circuits before opening a session, so execute_write is
  never called.

Both tests pass against the existing coordinator implementation.  Chunk-size
arithmetic is owned by Task 5 tests and is not re-tested here.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* test: coordinator phase ordering nodes->patches->edges (#278 A.4)

* test: re-raise invariant — 3 cases, full restore, ERROR log (#278 A.5)

Add parametrized test test_reraise_restores_full_snapshot_and_logs covering 3
materially different durable-progress states:
  - first_chunk_fails (index 0): nothing committed to Neo4j
  - later_chunk_same_phase_committed (index 1): first node chunk committed,
    second node chunk fails — partial within node phase
  - edge_after_nodes_committed (index 3): all 3 node chunks committed,
    first edge chunk fails — partial durable progress across phases

In all 3 cases asserts:
  1. RuntimeError('chunk boom') propagates out of flush()
  2. _node_buffer and _edge_buffer fully restored to original snapshot
  3. ERROR log containing 'flush_chunk_failed' is emitted

Also adds helpers:
  - _seq_execute_write_failing_on(call_index): execute_write mock that
    succeeds until call_index then raises RuntimeError('chunk boom')
  - _wire_session(store, execute_write_mock): wires fake session boundary
    (MagicMock cm / __aenter__ / __aexit__) onto store

Hard constraint #4 guard: coordinator must re-raise on any chunk failure
and never return success after a partial flush.
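
A possible shape for the failing-mock helper, sketched with unittest.mock.AsyncMock. The body is an assumption: in particular, whether calls after call_index also raise is not stated above, and this sketch assumes they do.

```python
import asyncio
from unittest.mock import AsyncMock


def seq_execute_write_failing_on(call_index: int) -> AsyncMock:
    """Return an awaitable mock that succeeds until call_index, then raises."""
    seen = {"calls": 0}

    def _side_effect(*args, **kwargs):
        current = seen["calls"]
        seen["calls"] += 1
        if current >= call_index:
            raise RuntimeError("chunk boom")
        return None

    return AsyncMock(side_effect=_side_effect)


async def _drive(mock: AsyncMock) -> str:
    await mock("chunk-0")          # index 0 succeeds
    try:
        await mock("chunk-1")      # index 1 fails
    except RuntimeError as exc:
        return str(exc)
    return "no error"


execute_write = seq_execute_write_failing_on(1)
outcome = asyncio.run(_drive(execute_write))
```

Choosing call_index 0, 1, or 3 against a three-node-chunk buffer would select the three durable-progress states listed above.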

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: thread flush-chunk bounds from settings into get_or_create (#278)

Pass flush_chunk_rows and flush_chunk_bytes from Settings into the
Neo4jGraphStore constructor in get_or_create. Settings is already
fetched at the top of get_or_create (line 434); the new fields reuse
the same settings binding without a second get_settings() call.

Also extend the _SettingsProxy in tests/conftest.py to expose the two
new fields so the autouse safe_settings fixture does not cause
AttributeError when the registry path exercises them in tests.

Test: test_get_or_create_threads_flush_chunk_bounds monkeypatches
Neo4jGraphStore and start_drain, constructs SessionRegistry, calls
get_or_create, and asserts flush_chunk_rows==100 and
flush_chunk_bytes==4_194_304 (the Settings defaults).

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* ops: set finite db.memory.transaction.max deployment cap (#278)

* test: live finalization-path freeze->restart->drain arc, OOM cause asserted (#278)

Three-leg integration test covering the _finalize_session failure path
that manifests as frozen offsets across restarts (issue #278):

Leg 1 OLD-FREEZE: rows=10_000_000 / byts=10_000_000_000 forces all ~201
  fat nodes (~8 MB total) into a single transaction, hitting the 2 MiB
  per-transaction cap.  Asserts:
  - 'finalize_tail_flush_failed' logged
  - _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError')
    positively present in caplog (not a proxy assertion)
  - Worker stays registered (finalize returned early without cleanup)
  - Committed offset frozen at 0
  - 0 f-* ToolCall nodes committed

Leg 2 RESTART: fresh SessionRegistry + SessionWorker over the same
  on-disk queue with the same old bounds.  Confirms the freeze survives
  a process restart (offset still 0, 0 committed).

Leg 3 RESTART FIXED: rows=50 / byts=262_144 keeps each chunk ~250 KB
  well under the 2 MiB cap.  Asserts:
  - Offset advances to tail_end (full queue drained)
  - Worker deregistered on successful finalization
  - 100 f-* ToolCall nodes committed to Neo4j

Queue seeding: 100 tool:pre events each carrying tool_call_id='f-{i}'
  and tool_input='x'*40_000 (fat ToolCall + Event nodes), plus one
  session:end event.  Also adds _line() helper and top-level imports
  (json, logging, Path) to the module.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* test: live cross-chunk referential integrity + large-buffer happy path (#278)

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: add SessionWorker.last_successful_flush liveness field (#278)

Add last_successful_flush: float = field(default_factory=time.time) to
the SessionWorker @dataclass in registry.py.

The field defaults to the worker's creation time (NOT 0.0) so a
brand-new worker reads as fresh, not ancient. A 0.0 default would make
every new worker appear to have last flushed in 1970. Defaulting to
creation time means 'no flush has happened yet, but the worker is
fresh.' The field will be stamped in _flush_barrier in a subsequent
task.
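
To illustrate why the default matters, here is a small sketch assuming a cut-down worker class and a hypothetical flush_age_seconds helper (neither is the real registry code):

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionWorkerSketch:
    session_id: str
    # default_factory runs at construction, so each worker gets its own
    # creation timestamp rather than a shared value or 0.0.
    last_successful_flush: float = field(default_factory=time.time)


def flush_age_seconds(worker: SessionWorkerSketch, now: Optional[float] = None) -> float:
    """Hypothetical liveness reading: seconds since the last good flush."""
    current = time.time() if now is None else now
    return current - worker.last_successful_flush


fresh = SessionWorkerSketch("s1")
ancient = SessionWorkerSketch("s2", last_successful_flush=0.0)
```

flush_age_seconds(fresh) is close to zero, while the 0.0 variant reports an age of more than fifty years, which is the false staleness the creation-time default avoids.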

TDD: test TestLastSuccessfulFlushField::test_defaults_to_creation_time_not_zero
first failed with AttributeError ('SessionWorker' object has no attribute
'last_successful_flush'), then passed after the field was added.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: stamp last_successful_flush once at the _flush_barrier boundary (#278)

Phase 2 (#278): after awaited flush succeeds inside _flush_barrier, stamp
worker.last_successful_flush = time.time().

All three drainer success paths (drain, exhausted-per-line, finalize) funnel
through _flush_barrier, so this single stamp covers all of them. A separate
stamp at each call site would be redundant and drift-prone.

Test: TestFlushBarrierStampsLiveness.test_flush_barrier_advances_last_successful_flush
- Forces worker.last_successful_flush = 0.0 before call
- Asserts value >= before after _flush_barrier returns
- Confirmed FAIL before fix (stays 0.0), PASS after
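
The single-stamp idea can be sketched like this, assuming a simplified barrier that takes the flush coroutine as a parameter (the real _flush_barrier signature is not shown in this log):

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class WorkerSketch:
    last_successful_flush: float = field(default_factory=time.time)


async def flush_barrier(
    worker: WorkerSketch, flush: Callable[[], Awaitable[None]]
) -> None:
    # If flush() raises, the stamp below is skipped, so the field only
    # ever records flushes that actually succeeded.
    await flush()
    worker.last_successful_flush = time.time()


async def _ok() -> None:
    return None


worker = WorkerSketch()
worker.last_successful_flush = 0.0   # force a stale value, as the test does
before = time.time()
asyncio.run(flush_barrier(worker, _ok))
```

Because every success path funnels through the one barrier, no call site needs its own stamp.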

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: add SessionRegistry.orphaned_sessions() predicate (#278)

A worker is orphaned iff it is still registered in _workers AND its
asyncio task has completed (task.done()). This catches the
finalization-path orphan (tail flush returns early without
deregistering, so the task completes but the worker remains) and any
unhandled exception that escapes the drain loop.

Deterministic and instant — no timer, no threshold.

Three tests added (TestOrphanedSessions):
- test_completed_task_worker_is_orphaned: done task → reported
- test_live_task_worker_is_not_orphaned: running task → not reported
- test_no_task_worker_is_not_orphaned: task=None → not reported
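
A sketch of the predicate under assumptions: the class names and the list return type are illustrative, and only the rule (registered AND task.done()) comes from the text above.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WorkerSketch:
    session_id: str
    task: Optional[asyncio.Task] = None


class RegistrySketch:
    def __init__(self) -> None:
        self._workers: Dict[str, WorkerSketch] = {}

    def orphaned_sessions(self) -> List[WorkerSketch]:
        # Registered, has a task, and that task has finished: no timer needed.
        return [
            w for w in self._workers.values()
            if w.task is not None and w.task.done()
        ]


async def _scenario() -> List[str]:
    registry = RegistrySketch()
    done = asyncio.create_task(asyncio.sleep(0))
    await done
    live = asyncio.create_task(asyncio.sleep(60))
    registry._workers = {
        "done": WorkerSketch("done", done),
        "live": WorkerSketch("live", live),
        "none": WorkerSketch("none"),
    }
    ids = [w.session_id for w in registry.orphaned_sessions()]
    live.cancel()
    try:
        await live
    except asyncio.CancelledError:
        pass
    return ids


orphan_ids = asyncio.run(_scenario())
```

Only the worker whose task has completed is reported; the running task and the task=None worker are not.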

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* feat: surface orphaned + last_successful_flush on status response (#278)

- Compute orphaned_ids set once from registry.orphaned_sessions() (single
  source of truth — no inline task.done() calls scattered through dict comp)
- Add orphaned (bool) and last_successful_flush (float) to each per-session
  dict in build_status_response
- Add top-level orphaned_sessions count (aggregate, safe for unauthenticated
  /status endpoint — no per-session error strings leaked)
- Tests: TestBuildStatusResponseOrphanVisibility — 3 tests covering
  done-task → orphaned=True + count, running-task → orphaned=False + count=0,
  and last_successful_flush presence/value
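
A rough sketch of the response shape, assuming a plain-dict input and a list of per-session dicts. The real build_status_response takes a registry and returns more keys than shown here.

```python
from typing import Any, Dict, List, Set


def build_status_sketch(
    workers: Dict[str, Dict[str, Any]], orphaned_ids: Set[str]
) -> Dict[str, Any]:
    """Hypothetical reduction of build_status_response to the new fields."""
    sessions: List[Dict[str, Any]] = [
        {
            "session_id": sid,
            # Membership test against the precomputed set: one source of truth.
            "orphaned": sid in orphaned_ids,
            "last_successful_flush": info["last_successful_flush"],
        }
        for sid, info in workers.items()
    ]
    # Aggregate count only at the top level; no per-session error strings.
    return {"orphaned_sessions": len(orphaned_ids), "sessions": sessions}


status = build_status_sketch(
    {
        "s1": {"last_successful_flush": 100.0},
        "s2": {"last_successful_flush": 200.0},
    },
    orphaned_ids={"s2"},
)
```

Computing the orphan set once and reusing it for both the flag and the count keeps the two from drifting apart.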

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* test: live E2E — real finalization orphan surfaces on /status (#278)

Adds tests/neo4j/test_orphan_visibility.py: a module-scoped live E2E test
(pytestmark = pytest.mark.neo4j) proving a genuine finalization-path orphan
surfaces on /status.  The test drives the real start_drain → drain_worker →
_finalize_session path — worker.task is an actual asyncio.Task that
transitions to done().

Reproduction recipe (deterministic, ~29s against neo4j:5.26.22-community):

Seed shape (line counts are load-bearing):
  Lines   1-99:  tiny tool:pre  (tool_input 'x'*16,     key space small-{i})
  Line   100:    session:end    (terminal; exactly fills read_batch max_items=100)
  Lines 101-200: fat  tool:pre  (tool_input 'x'*40_000, key space f-{i}, ~8 MB)

WHY:
  The pre-terminal block is exactly 100 lines so the drainer's first read_batch
  returns only those, commits cleanly (tiny flush << 2 MiB cap), sets
  saw_terminal → _finalize_session.  The finalization tail (100 fat lines) is
  flushed in ONE transaction (rows=10_000_000, byts=10_000_000_000) → OOM.
  _finalize_session does NOT retry; one OOM → finalize_tail_flush_failed log
  + early return → orphan (registered worker, completed task).

Orphan post-state assertions (all required, none weakened):
  - worker.task.done() True
  - sid still in registry._workers (not deregistered)
  - 'finalize_tail_flush_failed' in caplog.text
  - _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError') in caplog.text
  - committed offset frozen at pre-terminal boundary (== boundary, != tail_end)
  - 0 f-* nodes committed (queried via a FRESH check_store)
  - worker in registry.orphaned_sessions()
  - build_status_response reports orphaned_sessions >= 1 and
    the session's per-session dict has orphaned: True

Teardown closes the still-open store driver (early-returned _finalize_session
did not call _safe_close).  No production code changes.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* chore: bump version 4.0.0 -> 4.0.1 (#278 Phase 1 + Phase 2 release marker)

- Add TestVersionIs401.test_pyproject_version_is_401 that reads pyproject.toml
  via tomllib and asserts version == '4.0.1' (single source of truth gate)
- Bump pyproject.toml line 7: version = "4.0.0" -> version = "4.0.1"

Test cycle confirmed:
  RED:  assert '4.0.0' == '4.0.1'  (AssertionError before bump)
  GREEN: 1 passed                   (after bump)

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* docs/refactor: apply pre-merge review fixes (#278)

Three issues from the holistic code review, all cosmetic/documentary:

1. dashboard.py build_status_response docstring — add orphaned_sessions to
   the Returns key list; document the orphaned_sessions count vs visible-
   session asymmetry (aged-out orphan contributes to count but won't appear
   with orphaned:True in sessions list); add orphaned and last_successful_flush
   to the sessions-dict key list.

2. test_orphan_visibility.py — rename test function to align with the spec's
   acceptance-criteria reference:
     test_real_drain_orphan_surfaces_on_status
     -> test_finalization_orphan_surfaces_on_status
   Resolves the Task 7 DONE_WITH_CONCERNS naming discrepancy.

3. test_version.py — rename TestVersionIs401 -> TestVersionIs4_0_1 and
   test_pyproject_version_is_401 -> test_pyproject_version_is_4_0_1 to
   eliminate the HTTP-401-status-code ambiguity in the class name.

Note: review recommendation #2 (strengthen flush-value assertion) was
already implemented: tests/test_dashboard.py lines 492-493 already carry
both the key-presence check and the value-equality check.

Non-Neo4j suite: 1343 passed, 2 skipped (no regressions).

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* chore: drop superpowers reader-audit doc from product docs/ (#278)

The reader audit conclusion (zero dangling-node readers in the codebase) is captured in the PR description. Superpowers-generated docs belong outside the product documentation tree per repo conventions.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

---------

Co-authored-by: Amplifier <amplifier@example.com>
Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

3 participants